Goto

Collaborating Authors

 East Java


CVQA: Culturally-diverseMultilingual VisualQuestionAnsweringBenchmark

Neural Information Processing Systems

Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data.


AquaFusionNet: Lightweight VisionSensor Fusion Framework for Real-Time Pathogen Detection and Water Quality Anomaly Prediction on Edge Devices

arXiv.org Artificial Intelligence

Abstract--Evidence from many low-and middle-income regions shows that microbial contamination in small-scale drinking-water systems often fluctuates rapidly, yet existing monitoring tools capture only fragments of this behaviour . Microscopic imaging provides organism-level visibility, whereas physicochemical sensors reveal short-term changes in water chemistry; in practice, operators must interpret these streams separately, making real-time decision-making unreliable. This study introduces AquaFusionNet, a lightweight cross-modal framework that unifies both information sources inside a single edge-deployable model. Unlike prior work that treats microscopic detection and water-quality prediction as independent tasks, AquaFusionNet learns the statistical dependencies between microbial appearance and concurrent sensor dynamics through a gated cross-attention mechanism designed specifically for low-power hardware. The framework is trained on AquaMicro12K, a new dataset comprising 12,846 annotated 1000 micrographs curated for drinking-water contexts, an area where publicly accessible microscopic datasets are scarce. Deployed for six months across seven facilities in East Java, Indonesia, the system processed 1.84 million frames and consistently detected contamination events with 94.8% mAP@0.5 and 96.3% anomaly-prediction accuracy, while operating at 4.8 W on a Jetson Nano. Comparative experiments against representative lightweight detectors show that AquaFusionNet provides higher accuracy at comparable or lower power, and field results indicate that cross-modal coupling reduces common failure modes of unimodal detectors, particularly under fouling, turbidity spikes, and inconsistent illumination. All models, data, and hardware designs are released openly to facilitate replication and adaptation in decentralized water-safety infrastructures. Safe drinking water is a prerequisite for public health, yet it remains out of reach for a substantial fraction of the global population. Recent estimates from the WHO/UNICEF Joint Monitoring Programme indicate that 2.2 billion people still lack safely managed drinking-water services and that unsafe water, sanitation, and hygiene (W ASH) contribute to approximately 1.4 million deaths per year [1], [2].



MAD: Manifold Attracted Diffusion

arXiv.org Machine Learning

Score-based diffusion models are a highly effective method for generating samples from a distribution of images. We consider scenarios where the training data comes from a noisy version of the target distribution, and present an efficiently implementable modification of the inference procedure to generate noiseless samples. Our approach is motivated by the manifold hypothesis, according to which meaningful data is concentrated around some low-dimensional manifold of a high-dimensional ambient space. The central idea is that noise manifests as low magnitude variation in off-manifold directions in contrast to the relevant variation of the desired distribution which is mostly confined to on-manifold directions. We introduce the notion of an extended score and show that, in a simplified setting, it can be used to reduce small variations to zero, while leaving large variations mostly unchanged. We describe how its approximation can be computed efficiently from an approximation to the standard score and demonstrate its efficacy on toy problems, synthetic data, and real data.


Structuralist Approach to AI Literary Criticism: Leveraging Greimas Semiotic Square for Large Language Models

arXiv.org Artificial Intelligence

Large Language Models (LLMs) excel in understanding and generating text but struggle with providing professional literary criticism for works with profound thoughts and complex narratives. This paper proposes GLASS (Greimas Literary Analysis via Semiotic Square), a structured analytical framework based on Greimas Semiotic Square (GSS), to enhance LLMs' ability to conduct in-depth literary analysis. GLASS facilitates the rapid dissection of narrative structures and deep meanings in narrative works. We propose the first dataset for GSS-based literary criticism, featuring detailed analyses of 48 works. Then we propose quantitative metrics for GSS-based literary criticism using the LLM-as-a-judge paradigm. Our framework's results, compared with expert criticism across multiple works and LLMs, show high performance. Finally, we applied GLASS to 39 classic works, producing original and high-quality analyses that address existing research gaps. This research provides an AI-based tool for literary research and education, offering insights into the cognitive mechanisms underlying literary engagement.


Enhancing Poverty Targeting with Spatial Machine Learning: An application to Indonesia

arXiv.org Machine Learning

This study leverages spatial machine learning (SML) to enhance the accuracy of Proxy Means Testing (PMT) for poverty targeting in Indonesia. Conventional PMT methodologies are prone to exclusion and inclusion errors due to their inability to account for spatial dependencies and regional heterogeneity. By integrating spatial contiguity matrices, SML models mitigate these limitations, facilitating a more precise identification and comparison of geographical poverty clusters. Utilizing household survey data from the Social Welfare Integrated Data Survey (DTKS) for the periods 2016 to 2020 and 2016 to 2021, this study examines spatial patterns in income distribution and delineates poverty clusters at both provincial and district levels. Empirical findings indicate that the proposed SML approach reduces exclusion errors from 28% to 20% compared to standard machine learning models, underscoring the critical role of spatial analysis in refining machine learning-based poverty targeting. These results highlight the potential of SML to inform the design of more equitable and effective social protection policies, particularly in geographically diverse contexts. Future research can explore the applicability of spatiotemporal models and assess the generalizability of SML approaches across varying socio-economic settings.


NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts

arXiv.org Artificial Intelligence

Indonesia is rich in languages and scripts. However, most NLP progress has been made using romanized text. In this paper, we present NusaAksara, a novel public benchmark for Indonesian languages that includes their original scripts. Our benchmark covers both text and image modalities and encompasses diverse tasks such as image segmentation, OCR, transliteration, translation, and language identification. Our data is constructed by human experts through rigorous steps. NusaAksara covers 8 scripts across 7 languages, including low-resource languages not commonly seen in NLP benchmarks. Although unsupported by Unicode, the Lampung script is included in this dataset. We benchmark our data across several models, from LLMs and VLMs such as GPT-4o, Llama 3.2, and Aya 23 to task-specific systems such as PP-OCR and LangID, and show that most NLP technologies cannot handle Indonesia's local scripts, with many achieving near-zero performance.


LLM Distillation for Efficient Few-Shot Multiple Choice Question Answering

arXiv.org Artificial Intelligence

Multiple Choice Question Answering (MCQA) is an important problem with numerous real-world applications, such as medicine, law, and education. The high cost of building MCQA datasets makes few-shot learning pivotal in this domain. While Large Language Models (LLMs) can enable few-shot learning, their direct application in real-world scenarios is often hindered by their high computational cost. To address this challenge, we propose a simple yet effective approach that uses LLMs for data generation and scoring. Our approach utilizes LLMs to create MCQA data which contains questions and choices, and to assign probability scores to the generated choices. We then use the generated data and LLM-assigned scores to finetune a smaller and more efficient encoder-only model, DeBERTa-v3-base by leveraging distillation loss. Extensive experiments on the Massive Multitask Language Understanding (MMLU) benchmark demonstrate that our method improves accuracy from 28.9% to 39.3%, representing a gain of over 10% compared to a baseline finetuned directly on 5-shot examples. This shows the effectiveness of LLM-driven data generation and knowledge distillation for few-shot MCQA.


Assessing the Human Likeness of AI-Generated Counterspeech

arXiv.org Artificial Intelligence

Counterspeech is a targeted response to counteract and challenge abusive or hateful content. It effectively curbs the spread of hatred and fosters constructive online communication. Previous studies have proposed different strategies for automatically generated counterspeech. Evaluations, however, focus on relevance, surface form, and other shallow linguistic characteristics. This paper investigates the human likeness of AI-generated counterspeech, a critical factor influencing effectiveness. We implement and evaluate several LLM-based generation strategies, and discover that AI-generated and human-written counterspeech can be easily distinguished by both simple classifiers and humans. Further, we reveal differences in linguistic characteristics, politeness, and specificity. The dataset used in this study is publicly available for further research.


Predicting Internet Connectivity in Schools: A Feasibility Study Leveraging Multi-modal Data and Location Encoders in Low-Resource Settings

arXiv.org Artificial Intelligence

Internet connectivity in schools is critical to provide students with the digital literary skills necessary to compete in modern economies. In order for governments to effectively implement digital infrastructure development in schools, accurate internet connectivity information is required. However, traditional survey-based methods can exceed the financial and capacity limits of governments. Open-source Earth Observation (EO) datasets have unlocked our ability to observe and understand socio-economic conditions on Earth from space, and in combination with Machine Learning (ML), can provide the tools to circumvent costly ground-based survey methods to support infrastructure development. In this paper, we present our work on school internet connectivity prediction using EO and ML. We detail the creation of our multi-modal, freely-available satellite imagery and survey information dataset, leverage the latest geographically-aware location encoders, and introduce the first results of using the new European Space Agency phi-lab geographically-aware foundational model to predict internet connectivity in Botswana and Rwanda. We find that ML with EO and ground-based auxiliary data yields the best performance in both countries, for accuracy, F1 score, and False Positive rates, and highlight the challenges of internet connectivity prediction from space with a case study in Kigali, Rwanda. Our work showcases a practical approach to support data-driven digital infrastructure development in low-resource settings, leveraging freely available information, and provide cleaned and labelled datasets for future studies to the community through a unique collaboration between UNICEF and the European Space Agency phi-lab.